Descents and nodal load in scale-free networks 
The load of a node in a network is the total traffic going through it when every node pair sustains 
a uniform bidirectional traffic between them on shortest paths. We show that nodal load can be 
expressed in terms of the more elementary notion of a node's descents in breadth-first-search (BFS or 
shortest-path) trees, and study both the descent and nodal-load distributions in the case of scale-free 
networks. Our treatment is both semi-analytical (combining a generating-function formalism with 
simulation-derived BFS branching probabilities) and computational for the descent distribution; it 
is exclusively computational in the case of the load distribution. Our main result is that the load 
distribution, even though it can be disguised as a power-law through subtle (but inappropriate) 
binning of the raw data, is in fact a succession of sharply delineated probability peaks, each of 
which can be clearly interpreted as a function of the underlying BFS descents. This finding is in stark 
contrast with the previously held belief, based on which a power law of exponent $-2.2$ was conjectured 
to be valid regardless of the exponent of the power-law distribution of node degrees. 



PACS numbers: 89.20.Hh, 89.75.Da, 89.75.Fb, 89.75.Hc 



I. INTRODUCTION 

In a scale-free network, node connectivities (or degrees) 
are distributed according to a power law, that is, the 
probability that a randomly chosen node has degree $k$ is 
proportional to $k^{-\tau}$ for some $\tau > 1$. Scale-free networks 
are therefore markedly different from networks of the classic 
Erdős-Rényi type [1], in which node degrees are Poisson- 
distributed. The importance of scale-free networks in 
various natural, social, and technological settings (the 
latter encompassing now ubiquitous structures such as 
the Internet and the WWW) has motivated considerable 
research along several fronts during the last decade. For 
the main results that have been attained the reader is 
referred to @ and to the chapters in [1, 0| ■ 

Most of these research efforts have been focused on ei- 
ther extracting a scale-free network structure out of data 
on some particular domain, or the creation of mecha- 
nisms of network evolution to function as generative mod- 
els of such networks. As a consequence, it seems fair to 
state that so far the greatest thrust has been directed 
toward what may be called the "syntactic" aspects of 
scale-free networks, as opposed to their "semantic" (or 
"functional") aspects, these being related to the higher 
processes, either natural or artificial, that depend on the 
underlying networks as a substrate. In the case of com- 
puter networks, for example, this issue is illustrated by 
the networks' topological properties, on the one hand, 
and their utilization (for end-to-end communication pro- 
tocols, data storage and retrieval, etc.), on the other. 

Still in the context of computer networks, exceptions 
to the research trend just mentioned can be found in the 
works reported in 0,[a,0|) all concerned with the efficient, 
global dissemination of information through the nodes of 
a network. The common thread that runs through all 
three of them is that degree-based local heuristics ex- 
ist for forwarding information through the nodes of the 
network so that, globally, good statistical properties are 
achieved (such as expecting delivery to occur for most 
nodes, for example). However, when disseminating infor- 
mation globally is the goal, we find that designing heuris- 
tics based on node degrees, even though meritorious by 
their eminently local nature, is somewhat lacking in plau- 
sibility, since important performance-related notions, like 
locally available bandwidth and node congestion, for 
example, remain inadequately accounted for. 

We see, then, that even as we move from the merely 
topological aspects of a network toward its higher-level, 
functional aspects, there remain entities that make up a 
node's set of local characteristics (e.g., node congestion) 
which ultimately can be understood as originating higher 
up at more abstract levels (e.g., the protocols that steer 
information this way or that as it moves through the net- 
work). Clearly, understanding such entities seems to be 
one of the fundamental keys to better design decisions at 
the upper levels. And even though the setting of com- 
puter networks provides good examples here, note that 
very similar issues are present in other contexts, such as 
that of networks representing road or street maps and, in 
fact, any other network where end-to-end flows of some 
sort intersect one another. 

In this paper we study the load of a node in a scale- 
free network. This property was originally introduced 
and analyzed in Q and gives, for the node in question, 
the total communication demand on that node when all 
node pairs sustain a uniform, bidirectional message traf- 
fic between them on shortest paths. Clearly, the load of 
a node is one of the aforementioned entities, bridging the 
various levels of abstraction at which the network may be 
analyzed. The study in Q is essentially based on simula- 
tions and ends with the conjecture that nodal load is dis- 
tributed as a power law whose exponent is invariant with 
respect to $\tau$ in the range (2, 3]. We follow a different ap- 
proach, providing both a semi-analytical treatment and 
results from computational simulations. As we discuss in 
the sequel, we have found that nodal-load distribution in 
the scale-free case is richly detailed in a way that can be 
understood by resorting to appropriate graph-theoretic 
concepts, such as breadth-first-search (BFS) trees and 
descents. This contrasts sharply with the purported na- 
ture of such a distribution as a power law, and also with 
the conjecture of a universal exponent. 



II. DESCENTS AND NODAL LOAD 

We conduct our study entirely on undirected random 
graphs whose degrees are distributed as a power law. 
Also, in order to avoid any spurious effects resulting from 
the existence of node pairs joined by no path at all, we 
concentrate exclusively on each graph's giant connected 
component (GCC), which for $\tau < 3.47$ is guaranteed to 
exist • For the sake of the analysis in this section, we 
then assume that G is a connected undirected graph. We 
let n be the number of nodes in G. 
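
As a minimal sketch of the restriction to the GCC, the following Python function returns the largest connected component of a graph given as an adjacency map. Taking the largest component of a finite sample as its GCC is an assumption of this sketch, and the name `largest_component` is hypothetical.

```python
from collections import deque

def largest_component(adj):
    """Node set of the largest connected component of an undirected graph.

    `adj` maps every node to an iterable of its neighbors.
    """
    seen, best = set(), set()
    for s in adj:
        if s in seen:
            continue
        comp, queue = {s}, deque([s])
        while queue:                      # plain BFS from s
            u = queue.popleft()
            for v in adj[u]:
                if v not in comp:
                    comp.add(v)
                    queue.append(v)
        seen |= comp
        if len(comp) > len(best):
            best = comp
    return best

# Two components and one isolated node; the path 0-1-2 is the largest.
example = {0: [1], 1: [0, 2], 2: [1], 3: [4], 4: [3], 5: []}
gcc = largest_component(example)
```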

Shortest paths in G are intimately connected with the 
graph's so-called BFS trees [9]. For each node r of G, 
a BFS tree of G rooted at r spans all of G's nodes and 
results from the process of visiting all nodes, beginning 
at r, in the following manner. First r is marked as visited 
and all its neighbors are placed in a queue, each joined 
to r by a tree edge. Then we repeatedly mark the node at 
the head of the queue as visited, add its neighbors that 
are neither visited nor already in the queue to the tail 
of the queue, and remove it from the queue. This is 
repeated until the queue becomes empty. If i is the 
head-of-the-queue node when its neighbor j is appended 
to the queue, then a tree edge is created between i and j. 
At the end, the resulting tree comprises exactly one path 
from r to each other node, and this path is shortest. Of 
course, depending on the order of addition of a node's 
neighbors to the queue, multiple BFS trees may exist for 
the same root r, and consequently multiple shortest paths 
from r to each of the other nodes. 
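
The queue-based procedure just described can be sketched in Python as follows. This is an illustrative sketch, not the code used for the simulations: the function name `random_bfs_tree` and the adjacency-map representation are assumptions, and shuffling each node's neighbors is only one way of obtaining different BFS trees for the same root (it does not necessarily sample them uniformly).

```python
import random
from collections import deque

def random_bfs_tree(adj, root, rng=random):
    """Return {node: parent} for one BFS tree rooted at `root`.

    `adj` maps each node of a connected undirected graph to a list of
    neighbors. Neighbors are shuffled before being queued, so repeated
    calls may return different BFS trees for the same root.
    """
    parent = {root: None}          # nodes already visited or queued
    queue = deque([root])
    while queue:
        i = queue.popleft()        # head-of-the-queue node
        neighbors = list(adj[i])
        rng.shuffle(neighbors)     # random order of addition to the queue
        for j in neighbors:
            if j not in parent:    # neither visited nor queued yet
                parent[j] = i      # tree edge between i and j
                queue.append(j)
    return parent

# 4-cycle 0-1-2-3-0: from root 0, node 2 can hang from node 1 or node 3,
# so there are two distinct BFS trees rooted at 0.
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
trees = {tuple(sorted(random_bfs_tree(adj, 0).items())) for _ in range(200)}
```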

Let $t_r$ be the number of distinct BFS trees rooted at $r$ 
and $T_r^1, \ldots, T_r^{t_r}$ the trees themselves. If $T_r^t$ is one 
of these trees, then we define the descent of node $i$ in 
$T_r^t$, denoted by $d_r^t(i)$, as the number of nodes in the 
sub-tree of $T_r^t$ rooted at $i$. This definition is also valid 
for $i = r$ and includes $i$ in its own descent [thus 
$d_r^t(i) = n$ if $i = r$ and $d_r^t(i) = 1$ if $i$ is a leaf in $T_r^t$]. 
We see that, by definition, $d_r^t(i)$ is the number of 
shortest paths on $T_r^t$ that lead from $r$ to some other 
node through node $i$. 
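
A short sketch of the descent computation on a single rooted tree follows. The parent-map representation and the name `descents` are assumptions made for illustration.

```python
def descents(parent):
    """Descent of every node in a rooted tree given as {node: parent}.

    The root has parent None. A node's descent counts the nodes in the
    sub-tree rooted at it, the node itself included.
    """
    children = {v: [] for v in parent}
    root = None
    for v, p in parent.items():
        if p is None:
            root = v
        else:
            children[p].append(v)
    order, stack = [], [root]
    while stack:                      # pre-order: parents before children
        v = stack.pop()
        order.append(v)
        stack.extend(children[v])
    d = {v: 1 for v in parent}        # every node counts itself
    for v in reversed(order):         # children before parents
        if parent[v] is not None:
            d[parent[v]] += d[v]
    return d

# Root 0 with child 1, which in turn has the two leaves 2 and 3.
d = descents({0: None, 1: 0, 2: 1, 3: 1})   # {0: 4, 1: 3, 2: 1, 3: 1}
```

In this example the root's descent equals the number of nodes (4) and each leaf has descent 1, as the definition requires.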

A node's descents are then related to its load. Assum- 
ing, as we do henceforth, that the notion of load includes 
traffic from the node in question to itself, then one pos- 
sibility for expressing the load of node $i$ in terms of its 
descents might seem to be to write it as 
$\sum_{r=1}^{n} \sum_{t=1}^{t_r} d_r^t(i)$. 
Notice, however, that this would make each pair of nodes 
weigh in the load of node $i$ in proportion to the number 
of distinct shortest paths between them going through $i$, 
which is not acceptable: the definition of load refers to 
uniform traffic between all node pairs, meaning that the 
traffic between pairs interconnected by multiple shortest 
paths is distributed among those paths. 

In order to avoid this distortion and still be able to do 
some mathematical analysis, we consider node $i$'s average 
descent in trees $T_r^1, \ldots, T_r^{t_r}$, denoted by $d_r(i)$, and 
substitute it for $\sum_{t=1}^{t_r} d_r^t(i)$ in the previous expression. 
Since $d_r(i) = \sum_{t=1}^{t_r} d_r^t(i)/t_r$, this corresponds to 
assuming that each of the multiple shortest paths between 
a node pair carries the same fraction of the total traffic 
between the two nodes. If $\ell(i)$ is the load of node $i$, the 
approximation we use is then 

$$\ell(i) = \sum_{r=1}^{n} d_r(i). \qquad (1)$$
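
The approximation of Eq. (1) can be sketched as below, with the average descent for each root estimated from a fixed number of randomly ordered BFS trees. The function names, the adjacency-map input, and the shuffling of neighbors as the source of tree randomness are assumptions of this sketch; the sample average matches the average over all distinct trees only insofar as the sampling is representative.

```python
import random
from collections import deque

def bfs_descents(adj, root, rng):
    """Descents of all nodes in one randomly ordered BFS tree rooted at `root`."""
    parent, order, queue = {root: None}, [], deque([root])
    while queue:
        i = queue.popleft()
        order.append(i)                 # BFS order: parents precede children
        neighbors = list(adj[i])
        rng.shuffle(neighbors)
        for j in neighbors:
            if j not in parent:
                parent[j] = i
                queue.append(j)
    d = {v: 1 for v in order}
    for v in reversed(order):           # accumulate from the leaves upward
        if parent[v] is not None:
            d[parent[v]] += d[v]
    return d

def approximate_load(adj, trees_per_root=50, seed=0):
    """Eq. (1): sum over roots of the average descent over sampled BFS trees."""
    rng = random.Random(seed)
    load = {v: 0.0 for v in adj}
    for r in adj:
        total = {v: 0 for v in adj}
        for _ in range(trees_per_root):
            for v, dv in bfs_descents(adj, r, rng).items():
                total[v] += dv
        for v in adj:
            load[v] += total[v] / trees_per_root
    return load

# Path 0-1-2: BFS trees are unique, so the estimate is exact.
load = approximate_load({0: [1], 1: [0, 2], 2: [1]})   # {0: 5.0, 1: 7.0, 2: 5.0}
```

For the two end nodes the value 5 = 3 × 1 + 1 × 2 reflects descent 3 in the tree rooted at the node itself and descent 1 in the other two trees.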

As we move to the setting of the GCC of a random 
graph whose degrees are power-law distributed, even a 
relation as simple as the one in Eq. (1) on the corre- 
sponding random variables is of little help, since a node's 
descents in the various BFS trees are not independent of 
one another. For this reason, in the remainder of this sec- 
tion we limit ourselves to pursuing the relatively simpler 
goal of analyzing the descent distribution of a randomly 
chosen node in a randomly chosen BFS tree. 

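If $i$ and $r$ are such a node and the root of such a tree, 
respectively, and if $i$ has $c_i$ immediate descendants on the 
tree, then clearly 

$$d_r(i) = \begin{cases} 1, & \text{if } c_i = 0; \\ 1 + \sum_{j=1}^{c_i} d_r(j), & \text{if } c_i > 0, \end{cases} \qquad (2)$$

where the sum ranges over the $c_i$ immediate descendants of $i$. 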

In the case of formally infinite n, it is possible to model 
descents via the branching process whose branching prob- 
abilities are given by the distribution of immediate de- 
scendants on the tree. If such a distribution is Poisson, 
for example, then descents can be found to be distributed 
according to the Borel distribution [10]. Other examples 
include a generalization of the Poisson case, yielding a 
generalization of the Borel distribution [11]. The branch- 
ing probabilities of interest to us, however, are difficult 
to determine analytically (cf. Section III), and for 
this reason, unlike the Poisson case or its aforementioned 
generalization, there is little hope of determining the de- 
scent distribution as a closed-form expression. Even so, 
some analytical characterization remains within reach. 
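
To illustrate the Poisson case mentioned above, the sketch below simulates the total progeny of a branching process with Poisson-distributed offspring and prints the empirical frequencies next to the Borel probabilities $e^{-\mu d}(\mu d)^{d-1}/d!$. The mean $\mu = 0.5$, the sample size, and all function names are assumptions chosen for illustration only.

```python
import math
import random

def poisson(mu, rng):
    """Knuth's multiplication method for a Poisson(mu) variate (small mu)."""
    threshold, k, p = math.exp(-mu), 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k

def descent_sample(mu, rng, cap=10**6):
    """Descent of the root: total progeny of a Poisson(mu) branching process."""
    pending, total = 1, 0
    while pending and total < cap:     # cap guards against runaway trees
        pending -= 1
        total += 1
        pending += poisson(mu, rng)    # immediate descendants of this node
    return total

def borel_pmf(d, mu):
    """Borel probability of descent d for Poisson offspring of mean mu < 1."""
    return math.exp(-mu * d) * (mu * d) ** (d - 1) / math.factorial(d)

rng = random.Random(1)
mu, trials = 0.5, 20000
samples = [descent_sample(mu, rng) for _ in range(trials)]
for d in (1, 2, 3):
    print(d, samples.count(d) / trials, borel_pmf(d, mu))
```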

For $c \ge 0$ and $d \ge 1$, let $P_c$ and $Q_d$ be, respectively, the 
probabilities that a randomly chosen node has $c$ imme- 
diate descendants and descent equal to $d$ in a randomly 
chosen tree. Let the corresponding generating functions 
be $\mathcal{P}(x)$ and $\mathcal{Q}(x)$, that is, 

$$\mathcal{P}(x) = \sum_{c \ge 0} P_c x^c \qquad (3)$$

and 

$$\mathcal{Q}(x) = \sum_{d \ge 1} Q_d x^d. \qquad (4)$$

Considering Eq. (2), and by well-known properties of 
probability generating functions [12, 13], we have 

$$\mathcal{Q}(x) = x\mathcal{P}(\mathcal{Q}(x)), \qquad (5)$$

where the $x$ factor compensates for the fact that the sum 
in Eq. (4) starts at $d = 1$ (instead of $d = 0$), thus ac- 
counting for the 1 summand in Eq. (2), and $\mathcal{P}(\mathcal{Q}(x))$ is 
the generating function of the distribution of the sum of 
a $P_c$-distributed number of independent, $Q_d$-distributed 
random variables. 

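In order to continue with the determination of each 
$Q_d$, we proceed in the same manner as [10, 11], based 
on the approach of [14]. First we let $q = \mathcal{Q}(x)$, so that 
Eq. (5) becomes $x = f(q) = q/\mathcal{P}(q)$, and define $g(q) = q$. 
Then we apply Lagrange's expansion [15] directly: for 
$f'(0) \neq 0$ (which we assume) and $g(q)$ infinitely differ- 
entiable (which it is), $g$ can be expressed as the power 
series in $x$ given by 

$$g(q) = g(0) + \sum_{d \ge 1} \frac{x^d}{d!} \left[ \frac{d^{d-1}}{dq^{d-1}} \left( g'(q) \left[ \frac{q}{f(q)} \right]^d \right) \right]_{q=0}. \qquad (6)$$

Comparing Eqs. (4) and (6), in turn, yields 

$$Q_d = \frac{1}{d!} \left[ \frac{d^{d-1}}{dq^{d-1}} \left[ \frac{q}{f(q)} \right]^d \right]_{q=0} = \frac{1}{d!} \left[ \frac{d^{d-1}}{dq^{d-1}} \mathcal{P}^d(q) \right]_{q=0} = \frac{1}{d!} \left[ \frac{d^{d-1}}{dq^{d-1}} \sum_{m \ge 0} R_m q^m \right]_{q=0}, \qquad (7)$$

where, by a well-known equality [16], 

$$R_m = \begin{cases} P_0^d, & \text{if } m = 0; \\ (1/mP_0) \sum_{l=1}^{m} (ld - m + l) P_l R_{m-l}, & \text{if } m > 0. \end{cases} \qquad (8)$$

After careful (but tedious) calculation, we obtain 

$$Q_d = \frac{R_{d-1}}{d}. \qquad (9)$$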
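
Eqs. (8) and (9) translate directly into a short numerical procedure. The sketch below is only an illustration under stated assumptions: the function name `descent_distribution` is hypothetical, `P` is a finite list of branching probabilities with `P[0]` nonzero, and entries beyond the end of the list are treated as zero. As a sanity check it uses Poisson branching probabilities, for which the result should reproduce the Borel distribution.

```python
import math

def descent_distribution(P, d_max):
    """Q_d for d = 1..d_max from branching probabilities P[c], Eqs. (8)-(9).

    For each d, R[m] is the coefficient of q**m in P(q)**d, obtained with
    the recurrence of Eq. (8); Eq. (9) then gives Q_d = R[d-1] / d.
    """
    Q = {}
    for d in range(1, d_max + 1):
        R = [P[0] ** d]                               # R_0 = P_0^d
        for m in range(1, d):
            s = sum((l * d - m + l) * P[l] * R[m - l]
                    for l in range(1, min(m, len(P) - 1) + 1))
            R.append(s / (m * P[0]))
        Q[d] = R[d - 1] / d
    return Q

# Check against the Borel distribution for Poisson offspring of mean mu.
mu = 0.5
P = [math.exp(-mu) * mu ** c / math.factorial(c) for c in range(40)]
Q = descent_distribution(P, 10)
borel = {d: math.exp(-mu * d) * (mu * d) ** (d - 1) / math.factorial(d)
         for d in range(1, 11)}
```

Each $Q_d$ costs $O(d^2)$ operations here, and for large $d$ the power `P[0] ** d` may underflow, so a production version would probably work with logarithms or rescaled coefficients.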



III. COMPUTATIONAL METHODOLOGY 

We use n = 1 000 in all our simulations. The reason for 
such a relatively modest value of n is that, for statistical 
significance, sufficiently many repetitions are needed for 
each of the three sources of randomness. These are: the 
number of graphs for each value of $\tau$ (we use 10 000), the 
number of roots for each graph (we use all nodes in the 
graph's GCC, whose number we denote simply by $n_{\mathrm{GCC}}$ 
even though it depends on the graph), and the number 
of BFS trees for each root (we use 50). For each value 
of $\tau$, the two distributions of interest (viz. the descent 
distribution and the nodal-load distribution) can be ob- 
tained by computing descents and accumulating them as 
needed to yield the nodal loads as in Eq. (1). 

FIG. 1: (Color online) Distributions of GCC sizes ($n_{\mathrm{GCC}}$). 

Each graph is generated in the following manner. First 
we sample a degree for each of the n nodes from the 
power-law degree distribution (this is repeated until a 
realizable degree sequence turns up, i.e., one whose de- 
grees sum up to an even value). Then node pairs are 
selected uniformly at random from the pool of nodes 
whose degrees are not yet exhausted by previous con- 
nections and a new edge is created between the nodes in 
each pair. This method may occasionally generate self- 
loops or multiple edges between the same two nodes, but 
it remains the method of our choice because it deploys 
edges independently of one another, which conforms to 
the independence assumption behind Eq. (5). 
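
A possible reading of this generation procedure is sketched below. Several details are assumptions of the sketch rather than statements of the text: degrees are drawn from 1 to n − 1, the two endpoints of an edge are drawn one after the other, each uniformly among the nodes with unexhausted degree, and the names `take` and `random_power_law_graph` are hypothetical.

```python
import random

def take(residual, rng):
    """Pick a node uniformly among those with residual degree; use one unit."""
    u = rng.choice(list(residual))
    residual[u] -= 1
    if residual[u] == 0:
        del residual[u]                  # degree exhausted: leave the pool
    return u

def random_power_law_graph(n, tau, seed=0):
    """Return (degrees, edges) for one random graph with power-law degrees."""
    rng = random.Random(seed)
    ks = range(1, n)                     # assumed degree range 1..n-1
    weights = [k ** (-tau) for k in ks]
    while True:                          # resample until the degree sum is even
        degrees = rng.choices(ks, weights=weights, k=n)
        if sum(degrees) % 2 == 0:
            break
    residual = dict(enumerate(degrees))  # nodes with unexhausted degree
    edges = []
    while residual:
        u = take(residual, rng)
        v = take(residual, rng)          # may equal u or repeat an earlier pair
        edges.append((u, v))
    return degrees, edges

degrees, edges = random_power_law_graph(200, 2.5, seed=1)
```

Because the degree sum is even, the pool is never empty when the second endpoint is drawn. Rebuilding the pool list for every draw keeps the sketch short at the cost of O(n) work per endpoint.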

The fact that we are constrained to operating within 
each graph's GCC has to be taken into account care- 
fully, since for the larger values of $\tau$, $n_{\mathrm{GCC}}$ tends to be 
distributed around a lower mean and more widely, as il- 
lustrated in Fig. 1. The consequences of this are twofold. 
First, as demonstrated in 0, a random graph's degree 
distribution is not preserved when conditioned upon the 
nodes' being part of the graph's GCC; so, even though 
we generate the graph from a scale-free degree distribu- 
tion, such a property is not guaranteed to hold within the 
GCC. Secondly, the analytical prediction of the descent 
distribution embodied in Eq. (9) is the result of assum- 
ing a formally infinite number of nodes [if not, then once 
again the independence assumption underlying Eq. (5) 
makes little sense], which is clearly an ever cruder as- 
sumption as $\tau$ increases and the GCC shrinks. 

Another source of difficulties concerning Eq. (9) is that 
it depends on the distribution of a node's immediate de- 
scendants on BFS trees (i.e., $P_c$ for $c \ge 0$), which to our 
knowledge cannot be determined analytically with satis- 
factory correctness or accuracy [20]. What we do is to 
resort to simulation data to fill in for this distribution, 
but even this has to be approached carefully, for reasons 
that are apparent in Fig. 2. In this figure, the distribu- 
tion of immediate BFS descendants within the GCC is 
shown for three values of $\tau$ and two values of n. For fixed 
$\tau$, the distribution seems to be the same (except for vari- 
ations due to finite-size effects) for both n = 1 000 and 
n = 10 000. So, although all our simulations are carried 
out for the smaller of these values of n, we use simulation 
data relative to the larger one, since the effects of finite 
n only become manifest for significantly higher degrees. 

FIG. 2: (Color online) Distribution of immediate BFS de- 
scents. 

FIG. 3: (Color online) Descent distributions. Solid lines give 
the analytical predictions of Eq. (9). Abscissae are normalized 
to $n_{\mathrm{GCC}}$ and binned. 

We remark, in addition, that this use of simulation 
data in lieu of the distribution called for in Eq. (9) may 
itself be prone to severe inaccuracy because of the already 
mentioned dependency on $\tau$ of the GCC-size distribution. 
For the larger values of $\tau$, the fact that GCC sizes are 
widely varying implies that any number giving a node's 
immediate BFS descent is necessarily highly dependent 
on the size of the current GCC. Ideally, we should ex- 
press such numbers as fractions of $n_{\mathrm{GCC}}$ (as in fact we 
do in Section IV for other quantities), but this would 
require, in place of Eq. (9), an expression in terms of 
such fractions as well. Regrettably, we have no such ex- 
pression just yet. 



IV. COMPUTATIONAL RESULTS AND 
DISCUSSION 

Our computational results are summarized in Figs. 3 
and 4 for five values of $\tau$ in the interval [2, 3]. Fig. 3 gives 
the descent distributions and also their analytical predic- 
tions as given by Eq. (9). Since no descent value is larger 
than the GCC size ($n_{\mathrm{GCC}}$) for the graph in question, all 
data are shown normalized to the appropriate $n_{\mathrm{GCC}}$: sim- 
ulation data are normalized to the corresponding GCC 
sizes occurring during the simulation, and analytical data 
to the mean GCC size for the $\tau$ value at hand. 

Notice that all simulated probabilities accumulate sig- 
nificantly at the largest possible normalized descent. 
While this is clearly due to the finiteness of n, for $\tau \le 2.75$ 
it also indicates that, had we been able to afford substan- 
tially larger values of n, we could expect this accumu- 
lated probability to spread through values of normalized 
descent one to two orders of magnitude below the max- 
imum and make the simulation data agree with the an- 
alytical predictions ever more closely from below. As we 
discussed in the previous section, this is in good agree- 
ment with the limitations we expect Eq. (9) to have for 
relatively small values of n. As for the remaining value of 
$\tau$ ($\tau = 3$), recall that in this case the effect of relatively 
small n is considerably more severe, since $n_{\mathrm{GCC}}$ has a very 
low mean and is also very widely spread. So, while we 
may still expect good agreement between simulation and 
analytical data as n grows, this seems to be reasonable 
only for values of n even larger than for the previous $\tau$ 
values. 

FIG. 4: (Color online) Load distributions. Solid lines give 
power laws of exponent $-2.2$. Abscissae are normalized to 
$n_{\mathrm{GCC}}^2$ and binned. 

All the simulation data in Fig. 4 are also normalized, 
but now to $n_{\mathrm{GCC}}^2$, since the greatest load value a node 
may have grows quadratically with the number of nodes 
[21]. These data are plotted against power laws of expo- 
nent $-2.2$, which is the exponent that in [1[ is conjectured 
to be universal with respect to $\tau$ for large n. And in fact 
the agreement of these power laws with the simulation 
data seems good for $\tau < 2.5$, as in these cases GCC sizes 
have a relatively high mean and low spread. However, 
unlike the case of the descent distributions, normalizing 
and binning the raw simulation data for the load distri- 
butions has the deleterious effect of masking important 
information that is present in the raw data and allows 
nodal-load distributions to be interpreted in terms of the 
underlying descents. 

FIG. 5: (Color online) Load distribution for $n_{\mathrm{GCC}} = 904$ and 
$\tau = 2$. 

This is illustrated in Fig. 5, where the raw simulation 
data are shown for $\tau = 2$ but restricted to graphs hav- 
ing $n_{\mathrm{GCC}} = 904$, where 904 is the observed mean GCC 
size. What we see in this figure is a succession of sharply 
defined probability peaks. The first peak occurs for a 
load value of 1 807, the second one for 3 611, the third 
for 5 413, and so on. If we examine these numbers in the 
light of Eq. (1), which expresses a node's load as the sum 
of its descents in the $n_{\mathrm{GCC}}$ distinct BFS trees, then they 
can be explained as follows: 

• The first peak's location can be decomposed as 
1 807 = 904 × 1 + 1 × 903, and therefore accounts 
for those nodes whose descent is 904 in exactly one 
tree (this happens for every node and corresponds 
to the tree rooted at it) and 1 in all the remaining 
903 trees (of which they are leaves). These, clearly, 
are all degree-1 nodes. Note also that the trees in 
which they have descent 1 constitute the near to- 
tality of the trees. 

• The location of the second peak can be similarly 
decomposed, for example as 3 611 = 904 × 1 + 903 × 
1 + 2 × 902, referring to those nodes whose descent is 
904 in the tree rooted at them, 903 in one other tree, 
and 2 in the remaining 902 trees. There may exist 
degree-2 nodes that conform to this arrangement of 
descents, but this is no longer necessary. Also, now 
it is the trees in which these nodes have descent 2 
that constitute the overwhelming majority of the 
trees. 

• For the third peak, we can write 5 413 = 904 × 1 + 
903 × 2 + 3 × 901, now referring to nodes that have 
descent 904 in the tree of which they are the root, 
903 in two other trees, and 3 in the remaining 901 
trees. Once again it is possible, though not necessary, 
for this arrangement to refer to degree-3 nodes. 
Continuing the trend established by the previous two 
cases, the trees in which they have descent 3 are by 
far the most numerous. 

This same pattern of "diophantine" decomposition can 
be applied to the subsequent peaks and, although the cor- 
respondence to node degrees beyond 1 is not guaranteed, 
we see that peak locations tend to become chiefly deter- 
mined by the descents which, from our previous analyses, 
we know are the most frequently occurring: 1, then 2, 
then 3, etc. 
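
The arithmetic behind the three decompositions can be checked with a few lines of Python. The closed form used below, with descent n in the node's own tree, n − 1 in k − 1 other trees, and k in the remaining n − k trees, is simply the pattern of the three cases above written for general k; applying it beyond k = 3 is an extrapolation made for illustration, not a result stated in the text.

```python
n = 904  # GCC size of the graphs considered in Fig. 5

def peak(k, n):
    """Load of a node with descent n in its own tree, n - 1 in k - 1 other
    trees, and k in the remaining n - k trees."""
    return n * 1 + (n - 1) * (k - 1) + k * (n - k)

print([peak(k, n) for k in (1, 2, 3)])   # [1807, 3611, 5413]
```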

As for larger values of $\tau$, we remark that the same 
type of behavior can also be observed, provided $\tau$ is suf- 
ficiently small for GCC sizes to be relatively large and 
concentrated around the mean. 



V. CONCLUDING REMARKS 

We have considered the load of nodes in scale-free net- 
works and have studied its distribution from the perspec- 
tive of expressing a node's load in terms of the node's 
descents in all BFS (or shortest-distance) trees in the 
graph. We have characterized the descent distribution 
semi-analytically by resorting to a generating-function 
formalism and to simulated data on the distribution of 
immediate BFS descendants. We then studied the distri- 
bution of nodal load, but through computer simulations 
only (analytical work in this case would require indepen- 
dence assumptions that we found to be too strong). 

Our results have allowed us to revisit the results of Q 
on the load distribution, particularly the conjecture that 
such a distribution is a power law whose exponent does 
not depend on $\tau$ (i.e., is independent of the underlying 
graph's degree distribution in the scale-free case). The 
purported universal exponent of the load distribution is 
$-2.2$, and indeed we have been able to confirm that such 
an exponent seems satisfactorily accurate for large net- 
works after data have been conveniently normalized and 
binned. 

Looking at the raw data, however, reveals that the 
load distribution is richly structured in a way that can 
be understood precisely by resorting to the characteriza- 
tion of nodal load in terms of descents in BFS trees. In 
our view, this discovery indicates that nodal load is not 
power-law-distributed and that the conjecture of a uni- 
versal exponent makes, after all, little sense. Of course, 
the origin of the previously accepted conclusion and con- 
jecture seems to have been the mishandling of data by 
inappropriate binning. This, along with other pitfalls of 
a similar nature, is often the source of inaccurate data 
interpretation [17]. 

We note, finally, that studying quantities like descents 
in trees and nodal load is well aligned with what we 
think should be the predominating direction in complex- 
network investigations. The overwhelming majority of 
network studies so far have concentrated primarily on 
structural notions of a predominantly local nature (e.g., 
node-degree distributions). Descents and loads, on the 
other hand, are examples of structural notions of a more 
global nature and, for this very reason, their study con- 
stitutes an important step toward complex-network re- 
search that emphasizes the networks' functional, rather 
than structural, properties. 